Using nested data frames to scale analyses
The orders data set is a good one to introduce the concepts of nested data frames and list-columns. These features in R help with automating analyses and keeping outputs together.
A nested data frame is one that includes a list-column of data frames. In this format, each row is a meta-observation, meaning the data are held in columns that define the observation and one or more list-columns of data frames that hold the individual data components. This may be easier to understand by comparing our orders data set in a nested and unnested format.
The orders data set has 45002 rows of 20 variables. So far, we’ve been exploring the data set as a whole. However, we’ve seen that there are differences among some facets of the data, such as department. It may be useful then to group the data by department and perform the same analyses for each department. This can be done by copying and pasting code and substituting department name for each of the 20 departments. This may be manageable for small numbers of groups, BUT, this is the 201 R class, so we’re going to learn how to use R to automate this type of work, so you can scale such analyses easily. We’ll see how data in nested or list-column format combined with the map functions we learned about earlier is well designed for this.
The syntax is relatively straightforward and similar to what we’ve seen previously when we grouped by variables and summarized. We first group our data by a selected variable or variables and pass this grouped data frame to nest(). Let’s see what this nested data frame looks like.
orders_nest <- orders %>%
group_by(department) %>%
nest()
## group_by: one grouping variable (department)
You should see two columns. One for the department and the second is a list-column that contains all the data for this department. Since there are 20 unique departments, there are 20 rows in the nested data frame.

We can access individual components from our nested data frame using [[]] or $ selector notation.
orders_nest$data # all 20 data frames in the data column
## [[1]]
## # A tibble: 6,330 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 19766 511388 PROTHROMBI… PRO Normal NA
## 2 88444 511388 BASIC META… BMP Normal NA
## 3 50728 501184 COMPREHENS… COMP Normal NA
## 4 91635 501184 CBC (HEMOG… CBC Normal NA
## 5 23789 507392 CHR PAIN D… UCPD1B Normal 3
## 6 17359 513008 CULTURE:VI… VCIR Normal NA
## 7 22570 501142 CHR PAIN D… UCPD1B Normal 3
## 8 51714 513163 URIC ACID,… URIC Normal 3
## 9 31718 513163 BASIC META… BMP Normal 3
## 10 73740 513163 CBC, DIFF CBD Normal 3
## # … with 6,320 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[2]]
## # A tibble: 1,486 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 40477 508061 THYROID ST… TSH Normal 3
## 2 97641 508061 T4, FREE T4FR Normal 3
## 3 75867 504805 GLUCOSE, W… 82962 On Site 3
## 4 75528 510431 THYROID ST… TSH Normal 3
## 5 98672 510431 T4, FREE T4FR Normal 3
## 6 84224 511065 GLUCOSE, W… 82962 On Site 3
## 7 80303 507712 GLUCOSE, W… 82962 On Site 3
## 8 77270 508305 ANTI THYRO… ATPO2 Normal NA
## 9 59682 503047 GLUCOSE, W… 82962 On Site 3
## 10 95169 503047 HEMOGLOBIN… A1CRPD Normal 3
## # … with 1,476 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[3]]
## # A tibble: 601 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 99868 505646 COMPREHENS… COMP Normal 3
## 2 31178 505646 GLUCOSE SE… GLUF Normal 3
## 3 87245 505646 HEMOGLOBIN… A1C Normal 3
## 4 50160 505646 LIPID PANEL LIPID Normal 3
## 5 99743 508791 LIPID PANEL LIPID Normal 3
## 6 93938 508791 HEMOGLOBIN… A1C Normal 3
## 7 87463 508791 COMPREHENS… COMP Normal 3
## 8 78615 510909 APOLIPOPRO… APOB Normal 3
## 9 64333 510909 THYROID ST… TSH Normal 3
## 10 61621 510909 HEMOGLOBIN… A1C Normal 3
## # … with 591 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[4]]
## # A tibble: 1,026 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 26516 511303 URIC ACID,… URIC Normal 3
## 2 47649 511303 STANDARD D… UDRSS Normal 5
## 3 10968 511303 THYROID ST… TSH Normal 3
## 4 55526 511303 BASIC META… BMP Normal 3
## 5 58870 511303 HEPATIC FU… HFPA Normal 3
## 6 45209 511303 IRON BINDI… IBCD Normal 3
## 7 34373 511303 IRON, SERUM FE Normal 5
## 8 44060 511303 B_TYPE NAT… BNAP Normal 3
## 9 23368 511303 URINALYSIS… UAC Normal 3
## 10 95109 511303 FERRITIN FER Normal 3
## # … with 1,016 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[5]]
## # A tibble: 2,093 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 79887 510902 CBC (HEMOG… CBC Normal NA
## 2 17602 507405 RENAL FUNC… RENFP Normal 3
## 3 41429 507405 RENAL FUNC… RENFP Normal 3
## 4 53695 510463 U/A NONAUT… 81002 On Site 3
## 5 29415 503167 PARATHYROI… IPTH Normal 3
## 6 49907 503167 URINE PROT… UPCRAT Normal 3
## 7 13350 503167 RENAL FUNC… RENFP Normal NA
## 8 79131 504256 MAGNESIUM MG Normal NA
## 9 74554 509817 BASIC META… BMP Normal NA
## 10 81892 509758 VITAMIN D … VITDG2 Normal 3
## # … with 2,083 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[6]]
## # A tibble: 3,501 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 42783 506703 HEPATITIS … HCAB Normal 3
## 2 11884 506703 HEPATITIS … HAVIGM Normal 3
## 3 98932 506703 HEPATITIS … HBSS Normal 3
## 4 83642 506703 HEPATITIS … HAVIGG Normal 3
## 5 12236 502031 U/A NONAUT… 81002 On Site 3
## 6 59945 511013 CBC, DIFF CBD Clinic Collect 3
## 7 33517 501790 U/A NONAUT… 81002 On Site NA
## 8 90317 513135 GLUCOSE, W… 82962 On Site NA
## 9 28414 513135 THYROID ST… TSH Clinic Collect 3
## 10 66783 513135 CHOLESTERO… CHOL Normal 3
## # … with 3,491 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[7]]
## # A tibble: 3,128 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 22780 503847 HEMOGLOBIN… A1C Clinic Collect 3
## 2 24316 503847 BASIC META… BMP Clinic Collect 3
## 3 24503 503847 LIPID PANEL LIPID Clinic Collect 3
## 4 97595 506119 CBC, DIFF CBD Clinic Collect 3
## 5 15073 506119 BASIC META… BMP Clinic Collect 3
## 6 78705 503144 CBC, DIFF CBD Clinic Collect 3
## 7 91697 503144 RENAL FUNC… RENFP Clinic Collect 3
## 8 99656 508435 GLUCOSE, W… 82962 On Site 3
## 9 24085 511453 U/A NONAUT… 81002 On Site 3
## 10 90665 511453 GLUCOSE, W… 82962 On Site 3
## # … with 3,118 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[8]]
## # A tibble: 503 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 99523 510460 PROTHROMBI… PRO Normal 3
## 2 26411 501568 GROUP A ST… BSARD Normal 3
## 3 14980 501568 R/O BETA S… BSC Normal 3
## 4 33230 501568 LAB ADD ON… LADDON Normal 3
## 5 34386 511328 HEMOGLOBIN… A1C Normal 3
## 6 63108 511328 LIPID PANEL LIPID Normal 3
## 7 94945 511328 URINE SCRE… UMALSP Normal 3
## 8 10561 511340 FERRITIN FER Normal 3
## 9 86003 511340 IRON BINDI… IBCD Normal 3
## 10 88863 511340 CBC (HEMOG… CBC Normal 3
## # … with 493 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[9]]
## # A tibble: 503 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 83813 504593 CHR PAIN D… UCPD1B Clinic Collect 3
## 2 66540 502678 CHR PAIN D… UCPD1B Clinic Collect 3
## 3 81501 501080 CHR PAIN D… UCPD1B Clinic Collect 3
## 4 57781 508385 CHR PAIN D… UCPD1B Clinic Collect 3
## 5 27831 509043 CHR PAIN D… UCPD1B Clinic Collect 3
## 6 12660 513930 CHR PAIN D… UCPD1B Clinic Collect 3
## 7 73489 504215 CHR PAIN D… UCPD1B Clinic Collect 3
## 8 30170 503959 OPIOID CON… UOPIAC Clinic Collect 3
## 9 55092 510557 CHR PAIN D… UCPD1B Clinic Collect 3
## 10 78907 512761 CHR PAIN D… UCPD1B Clinic Collect 3
## # … with 493 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[10]]
## # A tibble: 2,211 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 61321 512524 HEPATIC FU… HFPA Normal 1
## 2 63568 513876 COMPREHENS… COMP Normal 3
## 3 29849 513876 HEPATITIS … HCVQNT Normal NA
## 4 46095 513876 HEPATITIS … HBSS Normal NA
## 5 39759 513876 HEPATITIS … HAVIGG Normal NA
## 6 80259 513876 PROTHROMBI… PRO Normal 3
## 7 71530 513876 CBC, DIFF CBD Normal 3
## 8 37697 510569 HEPATITIS … HBSS Normal 3
## 9 14623 510569 COMPREHENS… COMP Normal 3
## 10 44648 510569 HEPATITIS … HAVIGG Normal 3
## # … with 2,201 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[11]]
## # A tibble: 2,179 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 55347 510095 URINE PREG… 81025 On Site 3
## 2 27773 511000 URINE PREG… 81025 On Site 3
## 3 43511 511000 PATHOLOGY,… SURG Clinic Collect 3
## 4 80696 501931 WET MOUNTS… Q0111 On Site 3
## 5 21481 501931 R/O YEAST … YSTF Clinic Collect 3
## 6 22998 510095 PATHOLOGY,… SURG Clinic Collect 3
## 7 47686 502539 PREGNANCY … UPG Normal 3
## 8 11078 506835 17-HYDROXY… OHPROG Normal 3
## 9 81410 506835 DHEA SULFA… DHEAS Normal 3
## 10 51073 506835 TESTOSTERO… FTTEST Normal 3
## # … with 2,169 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[12]]
## # A tibble: 724 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 86858 510501 BASIC META… BMP Normal 3
## 2 53879 510501 URINALYSIS… UACRC Normal 3
## 3 54707 510501 CBC (HEMOG… CBC Normal 3
## 4 17821 510501 PROTHROMBI… PRO Normal 3
## 5 55146 510501 PARTIAL TH… PTT Normal 3
## 6 68140 511149 CRP, HIGH … HSCRP Normal 3
## 7 23688 511149 CBC (HEMOG… CBC Normal 3
## 8 82147 511149 SED RATE ESR Normal 3
## 9 77846 504561 URINALYSIS… UACRC Normal 3
## 10 95958 504561 BASIC META… BMP Normal 3
## # … with 714 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[13]]
## # A tibble: 11,861 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 48260 508653 COMPREHENS… COMP Normal 3
## 2 53921 508653 CBC, DIFF CBD Normal 3
## 3 58381 508653 SEROLOGIC … SYPHS Normal 5
## 4 17124 508653 GC&CHLAM N… GCCTAD Normal 3
## 5 98089 508653 GC&CHLAM N… GCCTAD Normal 3
## 6 87763 508653 GC&CHLAM N… GCCTAD Normal 3
## 7 22841 511004 HIV ANTIGE… HVAGAB Normal NA
## 8 74600 511004 HEPATITIS … HBSA Normal NA
## 9 61276 505483 BASIC META… BMP Normal NA
## 10 79148 505483 ANTI TOXOP… TOXOG Normal NA
## # … with 11,851 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[14]]
## # A tibble: 1,015 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 41411 506437 PTH-RELATE… RPTHRP Normal 3
## 2 65645 506437 VITAMIN D … VITDG2 Normal 3
## 3 64871 506437 HEMOGLOBIN… A1C Normal 3
## 4 57461 508851 BASIC META… BMP Normal 3
## 5 32532 512508 LIPID PANEL LIPID Normal 3
## 6 44536 512508 THYROID ST… TSH Normal 3
## 7 20962 512948 CHRONIC PA… UCPDS Normal NA
## 8 35222 512508 CHR PAIN D… UCPD1B Normal 3
## 9 94655 500485 THYROID ST… TSH Normal 3
## 10 63171 500485 HEMOGLOBIN… A1C Normal 3
## # … with 1,005 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[15]]
## # A tibble: 2,499 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 50012 501660 LIPID PANEL LIPID Normal 3
## 2 38223 501660 CBC, DIFF CBD Normal 3
## 3 36299 501660 URIC ACID,… URIC Normal 3
## 4 47819 501660 COMPREHENS… COMP Normal 3
## 5 66313 501660 VITAMIN D … VITDG2 Normal 3
## 6 37682 501203 VITAMIN D … VITDG2 Normal 3
## 7 43926 501203 COMPREHENS… COMP Normal 3
## 8 11326 501203 LIPID PANEL LIPID Normal 3
## 9 67843 501203 CBC, DIFF CBD Normal 3
## 10 30756 502138 CBC, DIFF CBD Normal 3
## # … with 2,489 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[16]]
## # A tibble: 902 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 24866 503976 URINE PREG… 81025 On Site 3
## 2 44996 510987 VITAMIN D … VITDG2 Normal 3
## 3 90492 510987 BASIC META… BMP Normal 3
## 4 73371 510987 HEMOGLOBIN… A1C Normal 3
## 5 81208 510987 CBC, DIFF CBD Normal 3
## 6 11015 510814 VITAMIN D … VITDG2 Normal 3
## 7 35768 510814 CBC, DIFF CBD Normal 3
## 8 98020 510814 ZINC PROTO… ZPPH Normal 3
## 9 75804 510814 LEAD PB Normal 3
## 10 97951 510814 HIV ANTIGE… HVAGAB Normal 3
## # … with 892 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[17]]
## # A tibble: 781 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 50337 508808 CBC, DIFF CBD Normal NA
## 2 35868 508808 TYPE AND S… TSCR Normal NA
## 3 98477 505878 HEPATITIS … HAVIGG Normal 3
## 4 14470 505878 HEPATITIS … HBCA Normal 3
## 5 53160 505878 HEPATITIS … HBSA Normal 3
## 6 49114 509403 HEPATIC FU… HFPA Normal NA
## 7 10492 508361 HEPATITIS … HBSA Normal 3
## 8 19794 508361 HEPATITIS … HBCA Normal 3
## 9 12509 508361 HEPATITIS … HAVIGG Normal 3
## 10 46062 503197 CRP, HIGH … HSCRP Normal NA
## # … with 771 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[18]]
## # A tibble: 657 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 66536 506248 COMPREHENS… COMP Normal 3
## 2 12174 506248 CBC, DIFF CBD Normal 3
## 3 66374 509724 CBC, DIFF CBD Normal 3
## 4 97052 509724 COMPREHENS… COMP Normal 3
## 5 91016 508759 CBC (HEMOG… CBC Normal 3
## 6 78403 508759 COMPREHENS… COMP Normal 3
## 7 45997 509823 CBC, DIFF CBD Normal 3
## 8 78835 509823 COMPREHENS… COMP Normal 3
## 9 89187 505974 MITOCHONDR… MITPAN Normal 3
## 10 96199 509444 LYME DISEA… RLYMD Normal 3
## # … with 647 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[19]]
## # A tibble: 2,422 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 39794 508604 CELL COUNT… FCCNT Normal 3
## 2 73174 507693 URINALYSIS… UAC Normal 3
## 3 71219 507693 URINE C/S URNXC Normal NA
## 4 82705 513123 SED RATE ESR Normal NA
## 5 77427 513123 COMPREHENS… COMP Normal NA
## 6 83585 513123 CBC, DIFF CBD Normal NA
## 7 33707 513123 CRP, HIGH … HSCRP Normal NA
## 8 38398 510005 CRP, HIGH … HSCRP Normal 3
## 9 57991 510005 CBC, DIFF CBD Normal 3
## 10 62417 510005 COMPREHENS… COMP Normal 3
## # … with 2,412 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
##
## [[20]]
## # A tibble: 580 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 73273 501740 TACROLIMUS… TACROG Normal 3
## 2 56230 501740 COMPREHENS… COMP Normal 3
## 3 88922 501740 CBC, DIFF CBD Normal 3
## 4 50170 501740 MAGNESIUM MG Normal 3
## 5 13215 513015 OCCULT BLO… SOCULT Normal NA
## 6 40011 513015 CBC, DIFF CBD Normal 3
## 7 62529 513015 COMPREHENS… COMP Normal 3
## 8 47408 513015 PROTEIN EL… ELPP Normal 3
## 9 47865 513015 FIBRINOGEN FIBCL Normal 3
## 10 13639 513015 CBC, DIFF CBD Normal 3
## # … with 570 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
orders_nest[[2]][[1]] # data frame for first location
## # A tibble: 6,330 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 19766 511388 PROTHROMBI… PRO Normal NA
## 2 88444 511388 BASIC META… BMP Normal NA
## 3 50728 501184 COMPREHENS… COMP Normal NA
## 4 91635 501184 CBC (HEMOG… CBC Normal NA
## 5 23789 507392 CHR PAIN D… UCPD1B Normal 3
## 6 17359 513008 CULTURE:VI… VCIR Normal NA
## 7 22570 501142 CHR PAIN D… UCPD1B Normal 3
## 8 51714 513163 URIC ACID,… URIC Normal 3
## 9 31718 513163 BASIC META… BMP Normal 3
## 10 73740 513163 CBC, DIFF CBD Normal 3
## # … with 6,320 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
orders_nest$data[[1]] # data frame for the first location
## # A tibble: 6,330 x 18
## order_id patient_id description proc_code order_class_c_d… lab_status_c
## <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 19766 511388 PROTHROMBI… PRO Normal NA
## 2 88444 511388 BASIC META… BMP Normal NA
## 3 50728 501184 COMPREHENS… COMP Normal NA
## 4 91635 501184 CBC (HEMOG… CBC Normal NA
## 5 23789 507392 CHR PAIN D… UCPD1B Normal 3
## 6 17359 513008 CULTURE:VI… VCIR Normal NA
## 7 22570 501142 CHR PAIN D… UCPD1B Normal 3
## 8 51714 513163 URIC ACID,… URIC Normal 3
## 9 31718 513163 BASIC META… BMP Normal 3
## 10 73740 513163 CBC, DIFF CBD Normal 3
## # … with 6,320 more rows, and 12 more variables: lab_status_c_descr <fct>,
## # order_status_c <dbl>, order_status_c_descr <fct>, reason_for_canc_c <dbl>,
## # reason_for_canc_c_descr <fct>, order_time <dttm>, result_time <dttm>,
## # review_time <dttm>, ordering_route <fct>, pref_list_type <fct>,
## # order_month <dbl>, order_week <dbl>
To go back to a flat data frame, we use unnest() and specify which column we want to unnest, in this case it is the data column.
orders_unnested <- orders_nest %>% unnest(data) # looks just like orders
Back to our nested data frame, we now have data separated in a way that makes it easy to apply the same analysis across the data for all departments individually. Similar to previous lessons using summarize() and the different map() functions, we need a function to apply to our data. In the previous cases, we used existing functions in R: mean(), read_excel(). Here, we will review how to create our own function and map this across our list-column.
We saw above that there are differences in use of the discouraged provider preference lists. If we want to know how these vary by department, in more detail, we can create a table similar to those above, but for each department.
This table shows the top 10 orders most frequently ordered from a provider preference list by department and calculates the percent usage of the provider preference list for each test.
orders %>%
filter(department == "ENDOCRINOLOGY CLINIC") %>%
tabyl(description, pref_list_type) %>%
arrange(desc(`Provider Preference List`)) %>%
slice_head(n = 10) %>%
adorn_totals("row") %>%
adorn_percentages("row") %>%
adorn_pct_formatting()
## filter: removed 43,516 rows (97%), 1,486 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## description Clinic Preference List None
## THYROID STIMULATING HORMONE 8.2% 12.1%
## T4, FREE 10.6% 9.3%
## HEMOGLOBIN A1C, HPLC 26.6% 5.5%
## T3 8.1% 4.1%
## LIPID PANEL 7.9% 4.8%
## TESTOSTERONE, FREE & TOTAL 6.9% 0.0%
## URINE SCREEN, MICROALBUMINURIA 50.0% 12.9%
## COMPREHENSIVE METABOLIC PANEL 32.1% 0.0%
## ESTRADIOL 9.1% 4.5%
## VITAMIN D (25 HYDROXY) 30.8% 3.8%
## Total 16.7% 7.7%
## Provider Preference List
## 79.7%
## 80.1%
## 68.0%
## 87.8%
## 87.3%
## 93.1%
## 37.1%
## 67.9%
## 86.4%
## 65.4%
## 75.6%
Great - so, we could iterate over all 20 departments manually by changing the department in the filter() call – or we can automate this process. First we will build a function that creates this table for whatever dataset is supplied to it. We make small modifications to the code above to create our function:
- We need to identify that we are writing a function - use
function(){}
- Within the () supply the name for a data argument, in this case df, as we will be passing a data frame to our function.
- Within the {} we write the code we want carried out. This should be the code from above with a change to the input data. Since we will be iterating over our list column that is specific to a department, we don’t need the filter step.
- We execute the entire code chunk to save our new function to our environment.
pro_pref_sum <- function(df) {
df %>%
tabyl(description, pref_list_type) %>%
arrange(desc(`Provider Preference List`)) %>%
slice_head(n = 10) %>%
adorn_totals("row") %>%
adorn_percentages("row") %>%
adorn_pct_formatting()
}
Now we have our user defined function ready. We can test that it performs as expected by trying on data for of the departments.
pro_pref_sum(orders_nest$data[[2]]) # look at endocrinology
## slice_head: removed 547 rows (98%), 10 rows remaining
## description Clinic Preference List None
## THYROID STIMULATING HORMONE 8.2% 12.1%
## T4, FREE 10.6% 9.3%
## HEMOGLOBIN A1C, HPLC 26.6% 5.5%
## T3 8.1% 4.1%
## LIPID PANEL 7.9% 4.8%
## TESTOSTERONE, FREE & TOTAL 6.9% 0.0%
## URINE SCREEN, MICROALBUMINURIA 50.0% 12.9%
## COMPREHENSIVE METABOLIC PANEL 32.1% 0.0%
## ESTRADIOL 9.1% 4.5%
## VITAMIN D (25 HYDROXY) 30.8% 3.8%
## Total 16.7% 7.7%
## Provider Preference List
## 79.7%
## 80.1%
## 68.0%
## 87.8%
## 87.3%
## 93.1%
## 37.1%
## 67.9%
## 86.4%
## 65.4%
## 75.6%
We will use map() to apply our function to each of our data frames in the data list column, generating a table for each department.
Remember the syntax is: map(.x, .f). We’ll create a new column to hold the data for our summary tables.
orders_nest <- orders_nest %>%
mutate(pro_pref = map(data, pro_pref_sum))
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## slice_head: removed 547 rows (98%), 10 rows remaining
## mutate (grouped): new variable 'pro_pref' (list) with 20 unique values and 0% NA
Our data are now stored with the analyses. Hopefully you can imagine the power of this skill and the ability to easily apply functions (built-in or user-defined) across features of a data set.
Exercise 5
Let’s do an exercise to create a plot for each department to show the distribution of the result review times.
Write a function that creates a density plot for the review TAT and apply it to the data for each department. Add the plots as a new column in your nested data set.
End Exercise